When people look for a house to live in, there are many things to consider, from basic factors such as rent to more specific ones such as whether there are schools nearby. Searching for a house on your own and collecting all of this information can be a headache. We therefore propose an application that recommends places to live based on your information. If you provide information such as your age, race, or income, we will recommend places where you will find people similar to you. For such an application, it is important to choose which features to use when making recommendations. In this project, we visualize and analyze features that we think will be useful for the application. We look at census data with features such as rent, income, age, and education, as well as facility information such as the locations of police stations and schools.
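The recommendation idea above is essentially a nearest-neighbour search over place-level features. The sketch below is a minimal illustration under stated assumptions. It is not the method of a finished application.

- The tract IDs, feature names, and values are hypothetical.
- Only numeric features are handled. A categorical feature such as race would need a different encoding.
- Each feature is divided by its standard deviation, so that income (measured in dollars) does not swamp age.

```python
import math

# Hypothetical place-level features; the IDs and values are illustrative only.
TRACTS = {
    "tract_A": {"median_age": 31.0, "median_income": 52000.0},
    "tract_B": {"median_age": 45.0, "median_income": 98000.0},
    "tract_C": {"median_age": 29.0, "median_income": 61000.0},
}


def feature_std(places, feature):
    """Population standard deviation of one feature across all places."""
    values = [row[feature] for row in places.values()]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return std or 1.0  # fall back to 1.0 so a constant feature cannot divide by zero


def recommend(user, places, k=2):
    """Return the k place IDs closest to the user profile after scaling each feature."""
    stds = {f: feature_std(places, f) for f in user}

    def distance(row):
        # Dividing by the standard deviation keeps income from dominating age.
        return math.sqrt(sum(((row[f] - user[f]) / stds[f]) ** 2 for f in user))

    return sorted(places, key=lambda pid: distance(places[pid]))[:k]


print(recommend({"median_age": 30, "median_income": 58000}, TRACTS))
# → ['tract_C', 'tract_A']
```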
Here is a video that explains how we want the application to work:
We want to explore several features that might be useful in an application that provides personalized housing recommendations: rent, race, age, education, marital status, income, employment, and occupation. The data come from the United States Census Bureau's 2010-2014 American Community Survey 5-Year Estimates. They cover two geographic scopes: states (the large scope) and census tracts (the small scope). After a series of data aggregations, we obtained two files. One contains the features for every state in the US, and the other contains the features for the census tracts in New York City. The specific measures we use are median rent, median age, the percentage of people with a high school degree or higher, the married ratio, median income, and the unemployment rate. These features are displayed in the following interactive maps. Categorical features such as race are discussed later.
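When tracts or counties are aggregated, ratio-type measures such as the married ratio should be computed from summed counts rather than by averaging the ratios of smaller areas. The sketch below makes two assumptions. First, the field names are hypothetical. Second, the three ratios follow our own definitions; for example, the married ratio is currently married people over people aged 15 and older. Medians such as median rent cannot be rebuilt by summing and would instead be read directly from the ACS estimates for each geography.

```python
# Hypothetical raw counts for two census tracts; the field names are
# illustrative and are not actual ACS table or column identifiers.
TRACT_1 = {"pop_25_plus": 1000, "hs_or_higher": 870, "pop_15_plus": 1200,
           "now_married": 540, "labor_force": 800, "unemployed": 64}
TRACT_2 = {"pop_25_plus": 500, "hs_or_higher": 380, "pop_15_plus": 600,
           "now_married": 180, "labor_force": 400, "unemployed": 56}


def sum_counts(rows):
    """Add up raw counts field by field to form a larger geography."""
    totals = {}
    for row in rows:
        for key, value in row.items():
            totals[key] = totals.get(key, 0) + value
    return totals


def derived_features(counts):
    """Turn raw counts into ratio-type measures (assumed definitions)."""
    return {
        # Percent of people aged 25+ with a high school degree or higher.
        "pct_hs_or_higher": 100.0 * counts["hs_or_higher"] / counts["pop_25_plus"],
        # Share of people aged 15+ who are currently married.
        "married_ratio": counts["now_married"] / counts["pop_15_plus"],
        # Unemployed people as a percent of the labor force.
        "unemployment_rate": 100.0 * counts["unemployed"] / counts["labor_force"],
    }


print(derived_features(sum_counts([TRACT_1, TRACT_2])))
```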
We use point data for police stations, public schools, parks, and restaurants. These data were obtained from the following data sources.
First, we would like to get a general picture of the demographic features of the United States. Then we look into lower geographic levels for specific areas.
Here we show the population distribution by gender for the 10 most populous states.
The population plot shows that California has by far the largest population. Texas is second, and New York has a slightly larger population than Florida. We focus on New York later in our analysis.
We chose two states, Texas and Vermont, to compare their age structures and to understand why Texas has the second largest population while Vermont has the second smallest. The plot shows that Texas has a higher percentage of people in each age group under 40 and a lower percentage in each age group above 50. This suggests that Vermont has an older population and may not be the ideal place for young people to live.
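Texas is far larger than Vermont, so comparing age structures means comparing shares rather than raw counts. The minimal sketch below uses hypothetical counts, not the ACS figures behind our plot. It converts each state's age-group counts to percentages and reports the percentage-point gap for each group. A positive gap means the first state has a larger share of that group.

```python
# Hypothetical population counts by age group for two states of different size.
AGE_GROUPS = ["under 20", "20-39", "40-49", "50-64", "65+"]
STATE_A = [2900, 2900, 1300, 1700, 1200]  # larger, younger state
STATE_B = [220, 240, 130, 220, 190]       # smaller, older state


def to_percent(counts):
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def percent_gap(groups, a, b):
    """Percentage-point difference (a minus b) for each age group."""
    return {g: round(x - y, 1) for g, x, y in zip(groups, to_percent(a), to_percent(b))}


print(percent_gap(AGE_GROUPS, STATE_A, STATE_B))
# → {'under 20': 7.0, '20-39': 5.0, '40-49': 0.0, '50-64': -5.0, '65+': -7.0}
```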
For each education level, we want to find the state with the highest median income. If you hold a bachelor's degree or higher, your best choice is the District of Columbia, which is not surprising. However, if you have not completed high school, you may want to consider New Hampshire. For high school graduates, Alaska has the highest median income. For people with some college or an associate degree, Maryland does.
| Education level | State with highest median income |
|---|---|
| Overall | District of Columbia |
| Less than high school | New Hampshire |
| High school graduate | Alaska |
| Some college or associate degree | Maryland |
| Bachelor's degree | District of Columbia |
| Graduate or professional degree | District of Columbia |
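Each row of the table is an argmax over states for one education level. Here is a small sketch with hypothetical states and incomes, not our data:

```python
# Hypothetical median incomes (USD) by state and education level.
MEDIAN_INCOME = {
    "State A": {"high school graduate": 34000, "bachelor's degree": 61000},
    "State B": {"high school graduate": 36500, "bachelor's degree": 57000},
    "State C": {"high school graduate": 31000, "bachelor's degree": 66000},
}


def best_state_per_level(table):
    """For each education level, return the state with the highest median income."""
    levels = sorted({level for incomes in table.values() for level in incomes})
    return {
        # States missing a level are treated as -inf so they never win it.
        level: max(table, key=lambda state: table[state].get(level, float("-inf")))
        for level in levels
    }


print(best_state_per_level(MEDIAN_INCOME))
# → {"bachelor's degree": 'State C', 'high school graduate': 'State B'}
```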
Since most of us hold a bachelor's degree or higher, we take a closer look at these two education levels and compare their median incomes within each state.
Across states, median incomes for bachelor's degrees and for graduate degrees follow basically the same pattern. People with a graduate or professional degree earn considerably more than people with a bachelor's degree. For graduate or professional degrees, the five states with the highest median income are the District of Columbia, New Jersey, Maryland, Virginia, and California. For bachelor's degrees, they are the District of Columbia, New Jersey, Connecticut, Maryland, and Massachusetts. New York ranks only 9th for graduate degrees and 8th for bachelor's degrees.
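The rankings above, and the gap between the two degree levels, can be computed directly from a mapping of states to incomes. In this sketch the states and incomes are hypothetical. The graduate premium is simply the ratio of the two medians. This is one possible way, and our own assumption, to express the income difference as a single number per state.

```python
# Hypothetical median incomes (USD) for two education levels.
GRADUATE = {"State A": 80000, "State B": 72000, "State C": 90000, "State D": 68000}
BACHELOR = {"State A": 60000, "State B": 61000, "State C": 64000, "State D": 55000}


def rank_of(state, incomes):
    """1-based rank of a state when incomes are sorted from highest to lowest."""
    ordered = sorted(incomes, key=incomes.get, reverse=True)
    return ordered.index(state) + 1


def graduate_premium(graduate, bachelor):
    """Ratio of graduate to bachelor's median income for states present in both."""
    return {s: graduate[s] / bachelor[s] for s in graduate if s in bachelor}


print(rank_of("State B", GRADUATE), rank_of("State B", BACHELOR))  # → 3 2
print(graduate_premium(GRADUATE, BACHELOR))
```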
We then compare the marital status of four states. Two are representative West Coast and East Coast states, California and New York. The other two, the District of Columbia and Montana, have the highest and lowest median income. Despite the distance between them, California and New York have very similar marital structures; their radar plots are almost identical. In contrast, the District of Columbia and Montana have almost opposite marital structures. The District of Columbia has an extremely high rate of never-married people, while Montana has a lower never-married rate but higher rates of married and divorced people. This suggests that income may be related to people's decisions about marriage.
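Radar plots are compared by eye. One way to put a number on "almost identical" versus "almost opposite" is the total variation distance between two marital-status distributions. We did not use this technique in the project. A distance of 0 means identical shares, and 1 means no overlap. The percentages below are hypothetical.

```python
# Hypothetical marital-status distributions in percent, each summing to 100, in the order:
# never married, married, separated, widowed, divorced.
STATE_1 = [35, 48, 3, 5, 9]
STATE_2 = [36, 47, 3, 5, 9]
STATE_3 = [55, 30, 4, 4, 7]
STATE_4 = [28, 52, 2, 6, 12]


def total_variation(p, q):
    """Half the sum of absolute differences, rescaled from percent to [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q)) / 100.0


print(total_variation(STATE_1, STATE_2))  # nearly identical profiles
print(total_variation(STATE_3, STATE_4))  # very different profiles
```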
After analyzing the state-level data, we focus on the New York City area, starting with county-level features.
New York City consists of five counties: New York County (Manhattan), Kings County (Brooklyn), Queens County (Queens), Bronx County (Bronx), and Richmond County (Staten Island). Because the borough names are more familiar, we use them instead of the official county names in the following graphs.
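When tract-level rows are relabeled for the graphs, the county part of each census GEOID can be mapped to a borough name. This minimal sketch assumes the standard 11-digit tract GEOID: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. The example GEOID is hypothetical.

```python
# County FIPS codes (state 36 = New York) mapped to the borough names used in our graphs.
BOROUGH_BY_COUNTY_FIPS = {
    "36005": "Bronx",          # Bronx County
    "36047": "Brooklyn",       # Kings County
    "36061": "Manhattan",      # New York County
    "36081": "Queens",         # Queens County
    "36085": "Staten Island",  # Richmond County
}


def borough_for_tract(geoid):
    """Map an 11-digit tract GEOID (2-digit state + 3-digit county + 6-digit tract)."""
    return BOROUGH_BY_COUNTY_FIPS.get(geoid[:5])  # None for tracts outside NYC


print(borough_for_tract("36047000100"))  # → Brooklyn
```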
First, let's look at the key factor: rent. Because households paying no cash rent make up only about 5% of all households, we show only cash rent here. The following circos plot is divided into two halves: the five counties in the bottom half and the cash rent ranges in the top half. The width of each band represents the number of households in the corresponding county and rent range.
The bottom half shows how NYC households are distributed across the five counties. Staten Island has the fewest households, while Brooklyn and Manhattan have the most. About 80% of the households in the Bronx pay less than $1,500 in rent. The other counties have more balanced distributions across the rent ranges.
The top half shows that the most common rent range is $1,000-$1,499, and most of these households are in the Bronx, Brooklyn, and Queens. In contrast, about two-thirds of the households in the $2,000+ range are in Manhattan. The other rent ranges have no clearly dominant county.
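The two readings above are row and column shares of the same table of household counts by county and rent range, the table that determines the band widths. This sketch uses hypothetical counts, chosen only so that the two shares echo the 80% and two-thirds figures in the text.

```python
# Hypothetical household counts keyed by (borough, cash rent range).
COUNTS = {
    ("Bronx", "<$1,000"): 40,
    ("Bronx", "$1,000-$1,499"): 40,
    ("Bronx", "$1,500-$1,999"): 10,
    ("Bronx", "$2,000+"): 10,
    ("Manhattan", "$1,000-$1,499"): 30,
    ("Manhattan", "$2,000+"): 20,
}


def share_within_county(counts, county, rent_ranges):
    """Bottom-half reading: fraction of a county's households in the given ranges."""
    total = sum(n for (c, _), n in counts.items() if c == county)
    part = sum(n for (c, r), n in counts.items() if c == county and r in rent_ranges)
    return part / total


def share_of_range(counts, rent_range, county):
    """Top-half reading: fraction of a rent range's households in one county."""
    total = sum(n for (_, r), n in counts.items() if r == rent_range)
    return counts.get((county, rent_range), 0) / total


print(share_within_county(COUNTS, "Bronx", {"<$1,000", "$1,000-$1,499"}))  # → 0.8
print(share_of_range(COUNTS, "$2,000+", "Manhattan"))
```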
This shows that Manhattan is much more expensive to live in. Queens also appears to have relatively high rents. One possible reason is that more married couples and families with children live there than in nearby counties such as Brooklyn.
Cash Rent by County for NYC
Next, we use the same method to analyze other features in NYC. The following four circos plots show different demographic features across the five counties.
The upper-left graph shows the population composition by gender and age. Overall, there are more females than males in NYC. The rent graph shows that Manhattan has many households, yet Manhattan takes up a smaller share in this graph. This is because the age, education, and race graphs are based on population, while the rent graph is based on households. It suggests that fewer people live in each house or apartment in Manhattan than in the other counties.
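The argument above compares a borough's share of households with its share of population. When the household share is larger, that borough has fewer people per household than the others. Here is a sketch with hypothetical totals:

```python
# Hypothetical totals per borough: (population, households).
TOTALS = {"Manhattan": (1600, 800), "Brooklyn": (2500, 900)}


def persons_per_household(totals):
    return {b: pop / hh for b, (pop, hh) in totals.items()}


def shares(totals, index):
    """Each borough's share of the total; index 0 = population, 1 = households."""
    grand = sum(v[index] for v in totals.values())
    return {b: v[index] / grand for b, v in totals.items()}


print(persons_per_household(TOTALS))  # Manhattan 2.0 vs Brooklyn about 2.78
print(shares(TOTALS, 0)["Manhattan"], shares(TOTALS, 1)["Manhattan"])
```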
The upper-right graph shows education levels. Manhattan has a much larger share of people with a bachelor's degree or higher, and it accounts for more than half of the people in the graduate degree category.
Figure panels: Age by County for NYC (upper left), Education by County for NYC (upper right), Race by County for NYC (lower left), Income by County for NYC (lower right)
The lower-left graph shows the racial composition. About half of the Black population lives in Brooklyn, and about half of the Asian population lives in Queens. Except for Staten Island, each county's racial composition generally follows the citywide pattern.
The lower-right graph shows annual household income. As with rent, Manhattan tends to have more households in the higher income ranges, while the Bronx tends to have more in the lower ranges.
These demographic summaries can be very helpful when people are deciding which area to live in. We also show this information at lower geographic levels and on a map.
After the state and county levels, we zoom in to the census tract level. All the demographic and economic features are available at this level as well. Here we show only one summary graph; the detailed information is shown on the map.
The following bubble plot summarizes four variables for the 2,101 census tracts in NYC: county name (color), population (bubble size), average annual income (X axis), and median monthly rent (Y axis). We truncate the X axis at $250,000 but leave the Y axis uncut. Because the highest monthly rent category is “$2,800+”, many points sit on the $2,800 line. We keep them to show which areas have the most expensive rents. To ignore them, zoom in or select the area you are interested in on the graph.
The graph shows a clear positive relationship between income and rent. Most census tracts in the Bronx are near the lower-left corner, with both low income and low rent. In contrast, many tracts in Manhattan and Staten Island are in the upper-right area, with both high income and high rent. Census tracts in Queens and Brooklyn are mostly in the center, although Brooklyn also has some tracts at both extremes.
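The positive relationship can be summarized by a correlation coefficient. Rents are top-coded at “$2,800+”, so many tracts share the same rent value. A rank-based measure such as Spearman's correlation, with tied values given their average rank, is therefore a reasonable choice. This is our suggestion, not a statistic reported above. Here is a self-contained sketch with hypothetical tract values:

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical tracts: mean annual income and median monthly rent (top-coded at 2800).
INCOME = [30000, 45000, 60000, 90000, 150000]
RENT = [900, 1200, 1400, 2800, 2800]
print(round(spearman(INCOME, RENT), 3))  # → 0.975
```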
After the state, county, and census tract summaries, we look in detail at some of the point data we collected. Here is a map of areas categorized by mean income.
The map shows how income differs from area to area. In Manhattan, mean income is relatively high, and in some areas it ranges from $300,000 to $450,000. Here we compare restaurant price levels across several areas. We used restaurant information from the Google Places API, which provides the following price levels.
We pick the Upper East Side and Harlem. The Upper East Side is an example of an area with high rent (high mean income), and Harlem is an example of an area in Manhattan with low rent (low mean income).
The following maps compare restaurant price levels in Harlem and on the Upper East Side:
In the map on the left (Harlem), restaurant price levels are relatively low. No restaurant has price level 4, and most have price level 1. The average price level of restaurants in Harlem is 1.41.
In contrast, in the map on the right (Upper East Side), restaurant price levels are relatively high, and some restaurants have price level 3 or higher. There are still some restaurants with price level 1 on the Upper East Side, because chain restaurants such as Subway and Dunkin’ Donuts are also located there. The average price level of restaurants on the Upper East Side is 2.09.
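Averages like the two above can be computed from the `price_level` field that the legacy Google Places API returns for a place. It is an integer from 0 (Free) to 4 (Very Expensive), and not every place has one. This sketch makes two assumptions. Restaurants without a price level are excluded from the average, and each restaurant has already been assigned to an area by a step not shown. The restaurants and area labels are hypothetical.

```python
from statistics import mean

# Legacy Google Places API price_level: 0 Free, 1 Inexpensive, 2 Moderate,
# 3 Expensive, 4 Very Expensive. Some places have no price_level at all.
# "area" is a hypothetical label we would attach ourselves; it is not an API field.
RESTAURANTS = [
    {"name": "Place 1", "area": "Area X", "price_level": 1},
    {"name": "Place 2", "area": "Area X", "price_level": 1},
    {"name": "Place 3", "area": "Area X", "price_level": 2},
    {"name": "Place 4", "area": "Area X"},  # no price level reported
    {"name": "Place 5", "area": "Area Y", "price_level": 2},
    {"name": "Place 6", "area": "Area Y", "price_level": 3},
    {"name": "Place 7", "area": "Area Y", "price_level": 4},
    {"name": "Place 8", "area": "Area Y", "price_level": 1},
]


def average_price_level(places, area):
    """Mean price level in one area, skipping places without a price level."""
    levels = [p["price_level"] for p in places if p["area"] == area and "price_level" in p]
    return mean(levels) if levels else None


print(average_price_level(RESTAURANTS, "Area X"))
print(average_price_level(RESTAURANTS, "Area Y"))  # → 2.5
```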
The following interactive map combines all of the information above in a single map of NYC. Use the drop-down menu to select the feature you would like to see, and zoom in to your area of interest to get a better view of nearby facilities.
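To support a "facilities nearby" view, a recommendation application would need to count the facilities within some distance of a candidate home. Here is a minimal sketch using the haversine great-circle distance. The 1 km radius, the coordinates, and the facility list are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def count_nearby(home, facilities, radius_km=1.0):
    """Count facilities of each kind within radius_km of home = (lat, lon)."""
    counts = {}
    for kind, lat, lon in facilities:
        if haversine_km(home[0], home[1], lat, lon) <= radius_km:
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical home location and facilities as (kind, latitude, longitude).
HOME = (40.7831, -73.9712)
FACILITIES = [
    ("school", 40.7840, -73.9700),
    ("park", 40.8000, -73.9600),
    ("school", 40.7800, -73.9750),
    ("police station", 40.7700, -73.9900),
]
print(count_nearby(HOME, FACILITIES))  # → {'school': 2}
```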
In this project, we explore many ways to visualize a variety of demographic features at different geographic levels. We believe these features are critical when people are looking for the right place to live.
If someone is thinking about moving within the US, our goal is to show them all the information that could be useful. For example, people with children may prefer locations with good schools or kindergartens, and new immigrants may prefer places with many people from the same background. We provide this information in our static and interactive graphs and maps. The state- and county-level summary graphs help users gain general knowledge about larger areas of the US. Once a user has decided which city or county interests them, we can show a detailed map and information down to the level of census tracts and individual facilities. This project follows this whole process, from state data to county data, census tract data, and local business and facility data. We believe that such information will save users a lot of time and help them learn a lot about their potential neighborhood from our site alone. Of course, a real application could include many other features to give users more information and help them make better decisions.